ABSTRACT


While there are general similarities among the different kinds of
forced-choice rating forms reported in the psychological literature,
there are also basic differences among them. When the Human Resources
Research Center of the Air Training Command was requested to develop a
new rating form for Air Force technical school instructors, an
opportunity for a comparative study of these basic differences was offered.

The objective of the present study was a relative evaluation of six
different kinds of forced-choice performance rating forms with respect
to validity, reliability, biasability, and the degree to which raters
liked different forms. The kinds of forced-choice forms used in the
investigation varied as to the content of the blocks of statements and
as to the directions which were given the raters.

About 2300 ratings were obtained from the six Air Force bases
associated with technical training. Item-analysis keys were developed
using three bases as one sample and the other three bases as another
sample for purposes of cross-validation. Validities were obtained by
correlating scores on the various forms with proficiency rankings as
assigned by instructor supervisors. Using the average validities for
five different scoring keys which were developed, validities ranging
from .53 to .69 were obtained under conditions of cross-validation.
Reliabilities were obtained by using the split-half technique,
attempting to equalize the two halves of the form with respect to block
validities. The stepped-up reliability coefficients ranged from .68 to .96.
When these were further adjusted to make all forms equivalent in length
to the longest form, the resulting reliability coefficients ranged from
.91 to .97.


Of the six forms used in this project, a form made up of blocks
containing four favorable-appearing statements from which the rater was
asked to choose the two which were most descriptive of the ratee gave
generally superior results. This form had highest average validity
(.68) and satisfactory reliability (.93), was least susceptible to
deliberate attempts to give high scores, and was one of the two forms best
liked by the raters.


Forms made up of blocks of four or five statements, of which some
appear favorable and some unfavorable, and from which the rater is asked
to choose the most descriptive statement and the least descriptive
statement, have been widely used in military and industrial situations. In
the present experiment two forms constructed in this manner showed a
relatively strong tendency to produce negatively skewed distributions
when the raters were told to give as high a score as possible. The
validities obtained for these two forms were the lowest (.53 and .56) of
those forms tested, although the forms yielded the highest reliabilities
(.96 and .97).


Conclusions


Of the forms used in this experiment, those in which the blocks were
composed of four favorable-appearing statements, from which the rater was
to choose either the two most descriptive, or the most and the least
descriptive, were generally superior.

The inclusion of both favorable and unfavorable statements in the
same block appears to be an inferior method of constructing forced-choice
forms.

A METHODOLOGICAL STUDY OF FORCED-CHOICE PERFORMANCE RATING


INTRODUCTION 


It is a part of the American culture that individuals should be
rewarded for effectiveness in doing a job. It is also a part of our
concept of efficiency that individuals whose performance interferes with
the effectiveness of an organization should be eliminated from that
organization. Such a system of reward and punishment is easy to apply
if a man's work can be evaluated in terms of the number or quality of
units produced. In most situations, however, the objective evaluation
of individual performance is either not possible or extremely difficult.
In such cases evaluation usually becomes a matter of a supervisor's
over-all judgment of how well an individual does his job. In the past,
this supervisory judgment of worker effectiveness was most commonly made
on an informal basis. If the supervisor thought a man should be
promoted, demoted, or eliminated, he either took or recommended the
appropriate action.

With the growth of industry and the parallel growth of personnel
departments, it became apparent to some personnel workers that a more
formalized or more systematic procedure for evaluation of worker
performance was needed. It seemed logical to assume that if supervisors
were required to be analytical in their evaluation the results
should be more valid. Thus, procedures were developed for obtaining
measures of the extent to which the worker possessed each of a number
of traits which were considered by management to be related to success
on the job.

In this connection, one of the more interesting research products
of World War II was the forced-choice performance rating method
developed in the Personnel Research Section of the Adjutant General's Office.
Officer ratings obtained on forced-choice rating forms were reported to
be substantially less biased, and more valid than comparable data
obtained by other rating methods (?5, OO, 99). Subsequent to the war,
the development of forced-choice rating forms for industrial supervisors
has been favorably reported by Richardson (66), and Seeley (77) reports
success in the use of the method in constructing rating forms for Naval
Air Ground School Instructors.

While the general procedures for the development of forced-choice
rating forms in the reported experiments are similar to those previously
employed, certain basic differences in methodology are apparent. When
the Human Resources Research Center of the Air Training Command was
requested to develop a new rating form for Air Force technical school
instructors, the opportunity to attempt to get answers to certain
methodological questions was presented. This bulletin reports the findings of
the resulting investigation. Moreover, since forced-choice rating
methods are of interest only if they can be shown to be superior to other
methods, in the following two sections of this report research findings
on forced-choice are compared with those reported for other rating
methods.

CONVENTIONAL RATING PROCEDURES
General


A conventional rating procedure, as the term will be used here, is
intended to mean any rating procedure which requires the rater to
indicate where the individual being rated stands on one or more good-poor
dimensions. This indication may be made by checking one of several
statements which describe various degrees of a given trait or it may be
accomplished by making a check mark along a line, one end of which is
identified with possession in high degree and the other possession in
low degree of the trait or behavior being rated. The simple ranking of
employees according to their over-all merit, while widely used, is not
included here as a conventional rating procedure.¹

All conventional rating procedures have one thing in common: it is
possible for the rater to tell whether a given check mark on the rating
form is going to have a favorable or an unfavorable effect on the ratee's
total score.

However, these conventional rating forms have a number of
advantages. They are simple, and hence easy to construct. They require
little motivation on the part of the rater. They can be quickly filled
out and easily scored. Also, claims are made that they provide a
convenient medium through which supervisors can deal more effectively with
their subordinates. The supervisors are said to be obliged to observe
their workers more closely. The completed forms can be used by the
supervisor in discussing workers' strong and weak points with them.

While these claims seem reasonable, the actual effect of the use of
conventional rating procedures on production or morale has yet to be
adequately evaluated.


¹ Where only a few employees are to be rated, or for experimental
use, ranking may provide an economical and relatively satisfactory way
of rating personnel. In rating situations involving a number of small
groups, however, the average ability and the distribution of abilities
within groups may differ sufficiently that the same numerical ranks
from different groups represent widely different capabilities.


There is not universal agreement as to the desirability and
effectiveness of rating procedures. Harrell (30) says that published
descriptions of rating systems are like published descriptions of bridge hands
in that they usually report only the instances in which the system worked
successfully. Poctovioe (57) states, "With the possible exception of
individuals who are primarily interested in selling their own pet rating
schemes, there is fairly uniform agreement that efforts to measure the
employee's service value satisfactorily have been relatively
unsuccessful."

A more detailed analysis of the weaknesses of conventional rating
procedures may be best presented under the following headings: Validity,
Reliability, Error of Leniency, Halo Effect, and Other Aspects. The
emphasis in this discussion will be on the use of conventional rating
procedures for teacher evaluation.


Validity 

Validity coefficients for rating forms are not always presented in
the publications which describe their use. The reason for this dearth
of validating statistics is not hard to find. If the situation in which
the rating forms are used were such that valid production records or
work samples were available as criteria, it probably would not be
necessary to use the rating forms. Nearly everyone would agree that, in
general, measures of production are to be preferred over subjective
ratings. It is in those situations in which production measures are
lacking that the principal use has been made of merit-rating devices.
Hence, there is seldom a criterion against which the validity of the
rating scores can be checked.

Psychologists and others who have been associated with the use of
rating forms have found it tempting to assume, in the absence of
validating information, that the scores which are yielded by the rating forms
have validity. This is an assumption that needs investigation, despite
the apparent reasonableness of the idea that a supervisor's opinion of a
worker's effectiveness should be highly related to the worker's actual
effectiveness.

A number of substitutes for validation against production criteria
are described by Driver (16). They include: comparison of scores
obtained from rating forms with those obtained from psychological tests,
or with those obtained from other rating devices; analysis of the
distribution of ratings; or analysis of the presence or absence of halo
effect. In addition to these three, supervisor and peer rankings have
also been used as criteria. While certain inferences can be drawn from
the results of these procedures, they do not permit the prediction of
the relationship which exists between rating scores and productive
effectiveness on the job.




In discussing validity of rating forms, Cooper (13) states, ". . . it
can be said that it is possible to develop rating forms that are
statistically reliable, but that one cannot help doubting their validity as
devices for measuring occupational proficiency or other performance."
Butsch (12), with respect to ratings of general teaching ability, agrees
that "Correlation studies have, in general, failed to reveal any
significant relationship between [ratings of] general teaching ability and
training, scholarship, intelligence, experience, age, salary, credits
earned or professional tests." Although this generalization does not have
a direct bearing on the relationship between ratings of teaching ability
and "actual teaching ability," it tends to throw suspicion on the validity
of such ratings. Knudsen and Stephens (41) conclude, from a survey of
teacher rating devices which were being used in school situations, "In
most instances validity of the device is implied in the assumption that
those who furnished the items included were competent to select those
traits that make up teaching effectiveness." This assumption is probably
just as prevalent in other situations in which ratings are made as it is
in teaching situations.

A series of studies conducted at the University of Wisconsin was
concerned with the general topic of the measurement of teaching ability.
These studies are of special interest here because they deal specifically
with the relationship between student growth in subject-matter performance
and various judgments of teacher effectiveness. In one of these, Rolfe
(68) studied student learning in connection with two three-week units of
work in the social studies in one- and two-room rural schools. He found
that rating scores as assigned by "experienced and competent supervisors"
were correlated .36 to .43 with pupil growth. Rostker (70), in another
study dealing with student learning of social studies, found that the
relationship between scores on supervisory rating scales and teaching
ability, as indicated by student progress, was insignificant. He concluded
that such rating scales should be used only with much discretion. In
connection with a factor analysis of the Rolfe and Rostker data,
Hellfritzsch (31) concluded that "teacher rating scales, although
frequently used to evaluate the effectiveness of a teacher, are only slightly
related to observed pupil growth in social studies. The relationship
does not appear to be large enough to warrant using supervisory ratings
as a secondary criterion in studies dealing with the measurement of
teaching ability, where teaching ability is properly conceived in terms
of the ability to promote pupil growth." LaDuke (42), in another of
the Wisconsin studies, also found that ratings by superintendents and
supervising teachers did not agree with the criterion of student learning.

Baird and Bates (4) studied the ratings by 128 principals of 571
teachers in Detroit. Pupil growth was determined by means of point
scores based on the norms of certain standardized tests. A correlation
coefficient of only .135 was found between the ratings and this criterion.
Taylor (88) found that the correlation between estimations of teaching
ability of instructors and progress of pupils in reading and arithmetic
was very slight. Pupil progress in arithmetic correlated .018 with the
teacher's estimated ability to teach arithmetic, while progress in
reading correlated .2i»l with estimated ability to teach reading.

In a non-teaching situation Stockford and Bissel (86) found the
relatively low correlation of .22 between an objective measure of work
performed by mechanics under various supervisors and the ratings of the
supervisors by their department heads. In contrast with this, the length
of time that the department heads had known each of the rated supervisors
correlated .59 and the rated social stimulus value of the personality of
the supervisors being rated correlated .op with the rating scores. This
would suggest the possibility that rating scores do a better job of
indicating the social acceptability of the one rated than they do of rating
his job performance.

From the information which has been presented here, certain
generalizations are possible with respect to the validity of performance ratings
in general, and of those of instructors in particular. These studies
suggest that all ratings need to be considered suspect until the
relationship between the ratings and objective measures of job performance
has been demonstrated. The data strongly suggest that there is very
little relationship between ratings as assigned by public school supervisors
and instructor performance as measured in terms of student learning.


Reliability--Rater Agreement

With respect to rating forms, several kinds of reliability measures
are possible. Most commonly the reliability which is given in the
published description of a rating form is either a correlation between
ratings and reratings of the same individual by the same rater, or the
correlation between the scores assigned by different raters using the
same form. While it is possible to compute the odd-even reliabilities
of most conventional rating devices, this statistic is not usually
reported in the technical descriptions of the rating form.
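
The odd-even reliability mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of any study cited here: it assumes each person rated has a list of numeric item scores, correlates the odd-item total with the even-item total, and then steps the half-form correlation up with the Spearman-Brown formula. The function names and the sample data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r, length_factor=2.0):
    """Estimated reliability of a form lengthened by length_factor."""
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


def odd_even_reliability(item_scores):
    """Return (half-form correlation, stepped-up reliability).

    item_scores holds one list of item scores per person rated.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    half_r = pearson(odd_totals, even_totals)
    return half_r, spearman_brown(half_r)


if __name__ == "__main__":
    # Hypothetical item scores for five ratees on a six-item form.
    scores = [
        [3, 4, 3, 5, 4, 4],
        [1, 2, 2, 1, 2, 1],
        [5, 5, 4, 4, 5, 5],
        [2, 3, 3, 2, 2, 3],
        [4, 3, 4, 4, 3, 4],
    ]
    half_r, full_r = odd_even_reliability(scores)
    print(f"half-form r = {half_r:.3f}, stepped-up r = {full_r:.3f}")
```

Calling `spearman_brown` with a `length_factor` other than 2 gives the estimate for a form lengthened by that ratio.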

Knudsen and Stephens (41), in a study of reports of 57 rating
devices used in academic situations, found that for kO of those devices no
evidence of reliability was presented.

Butsch (12) presents reliabilities for same rater-same scale,
same rater-different scale, and different rater-same scale. In some of
the cases the correlation coefficients are in the .^'O's, but
correlations in the .oO's and .70's are more typical. In an industrial
situation, Driver (16) found that the correlations of year-to-year ratings
by the same raters on Ik traits ranged from .59 to . -'t . Paterson (56)
found month-to-month ratings of the same workers by the same foremen to
correlate from .7'.., to .37. Correlation coefficients between different
foremen's ratings of the same workers ranged from t.) .JO.

Cooper (13) reports a correlation of .^8 between the ratings of
department store salespersons made by two judges. He found ratings of
interviewers by their two immediate supervisors to correlate .76. The
author concludes that (a) the reliability of the instrument depends not
so much upon the form used as upon the situation in which it has been
employed; (b) the same kind of rating blank varied in reliability when
used in situations involving different kinds of workers; and (c) simple
over-all rating proved to be as reliable as an elaborate rating form.

It is difficult to generalize from the published reliabilities since
these may not be a random sample of the reliabilities of rating forms in
general. The consideration of the evidence on the reliability of rating
forms should not cloud the issue with respect to validity. If the
ratings lack validity--and the possibility that this is often the case
is strongly suggested by the information which is available--the
securing of high reliabilities has very little meaning.


Error  of  Leniency 

Guilford (23) defines "error of leniency" as "a constant tendency
that many raters have in common ... to rate all individuals whom they
know above average in certain traits." Kneeland (39) uses the term to
describe the tendency of raters to rate well above the midpoint of the
scales used. The midpoint in this case is intended to be identified
with the average individual. As used here the term "lenient" will have
the same meaning as given to it by Kneeland.

Butsch (12) states that most raters rate too high and that this has
the effect of producing badly skewed distributions. Richardson (66)
points out that the tendency to over-rate apparently increases for every
year that a graphic or variant of the graphic system is in operation.

Fry (25) describes the situation which existed prior to the time
that the Army's earlier general efficiency rating was replaced by the
forced-choice Form 67-1. The former rating placed over 93 per cent of
the officers in the excellent categories. Forty-nine per cent of these
were in the highest or superior bracket with only a little over 1 per
cent in the middle (very satisfactory) group.

Stockford and Bissel (86) found that ratings tended to fluctuate
according to whether or not the supervisors were required to discuss the
ratings with the individuals rated. When the regular company method of
sending the ratings directly to the personnel department without
discussions with the individuals rated was used, the mean score on a scale of
100 was 60, with a standard deviation of 2]. When experimental ratings
were conducted two weeks later, with the ratings to be discussed with
those rated, the mean rating increased to 84, with a standard deviation
of 14. This difference of 24 points between the two means was highly
significant (C.R. = 20).
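
A critical ratio (C.R.) of this kind is the difference between two means divided by the standard error of that difference. The sketch below is only illustrative: it assumes two independent samples, which the source does not state, and the first standard deviation and both sample sizes are hypothetical placeholders rather than values from the study.

```python
from math import sqrt


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two means divided by its standard error.

    Assumes independent samples; ratings of the same persons on two
    occasions would need a correlated-samples standard error instead.
    """
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_2 - mean_1) / standard_error


if __name__ == "__main__":
    # Means of 60 and 84 and the second SD of 14 follow the text;
    # the first SD of 20 and n = 400 per group are hypothetical.
    cr = critical_ratio(60, 20, 400, 84, 14, 400)
    print(f"critical ratio = {cr:.2f}")
```

A ratio of 3 or more was conventionally treated as significant in the literature of this period, so a value near 20 would leave little doubt that the two means differ.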


Kneeland (39) shows that the tendency to be lenient exists even
when the rater has no apparent reason for being lenient. Fourteen
hundred customers were asked to rate sales clerks on five qualities related
to job performance (interest in customer, merchandise information,
display of merchandise, courtesy, and alertness). The customers rated the
salespeople on a 10-point scale on each trait. The average rating on
all traits was 6.H9. Seventy-five professional shoppers gave a mean
rating of 6.01 to the same sales clerks.

Richardson (66) points out that some raters over-rate more than
others, with the result that their ratings are not comparable with those
made by other supervisors or executives. Stockford and Bissel (86) found
that differences in leniency between raters were so great that all of the
employees working for the four most severe raters were rated lower than
the poorest ratings given by the two most lenient raters. Tiffin (92)
accounts for departmental differences in mean ratings in terms of actual
discrepancies in merit and differences in standards or interpretation.
He suggests that employee ratings should be compared only with those of
other employees from the same department rather than with ratings
obtained from the plant as a whole. Tiffin's point that the differences
in mean ratings of different departments may be partially accounted for
by actual differences in the merits of the employees is acceptable on
logical grounds. However, in view of the questionable validity of most
rating scores the relationship which exists between interdepartmental
discrepancies in merit rating scores and in the qualities of job
performance must remain indeterminate.

Evans (18) states that the validity of merit rating is likely to
be jeopardized whenever raters react emotionally to something in the
rating situation. These emotional reactions on the part of the raters
probably have the effect of raising the scores of the individuals rated.
Evans believes that the following kinds of feelings on the part of the
rater may affect the manner in which he rates: (a) Feelings concerning
his inadequacy to make the appraisal. This would include insufficient
knowledge of the procedures, of the performance of some or all of the
ratees, or inability to rate an employee on some of the rating factors.
(b) Feelings of doubt concerning the fairness and accuracy of the rating
method. This would include the conviction that the picture will be
distorted by some statistical means, by the exclusion of important
attributes or the inclusion of unimportant ones. In addition he may lack
knowledge of the consequences of his choosing a phrase as "most typical"
or "least typical". (c) Feelings of suspicion about what may happen to
him (the rater) as a result of the ratings. (d) Feelings of concern for
what may happen to his people as a result of the ratings.

Thus there seems to be little doubt, in most rating situations, of
the presence of tendencies to rate leniently. These undoubtedly have the
effect of lowering the validities of the ratings obtained. The
construction of rating devices in such a way as to minimize the leniency
phenomenon would seem to be a desirable objective.


Halo Effect


Thorndike (91) has stated that even a very capable foreman, employer,
teacher, or department head is unable to treat an individual as a compound
of separate qualities and to assign a magnitude to each of these qualities
independently of the others. Rugg (71) has suggested that this inability
to rate separate qualities results from tendencies to rate or judge men in
terms of a general mental attitude toward them and the domination of this
mental attitude toward the personality as a whole over attitudes toward
particular qualities.

This phenomenon, which manifests itself in intercorrelations between
ratings assigned to supposedly separate traits and in correlations between
trait scores and total scores, has come to be known as the "halo effect."
Richardson (66) points out that with conventional rating procedures it
does not matter which or how many traits or aspects of behavior are listed,
since general halo makes it nearly impossible to get a clear picture of a
man's strong and weak points. He believes that halo effect results from
the failure of conventional rating procedures to separate the reporting
of work performance from the evaluation of that performance, and that the
evaluation of work performance should therefore be established by
statistical means instead of being left to the individual rater's judgment or
caprice. He states that the tendencies to over-rate and to evince bias
for or against an individual employee are so deep-seated that
psychometric techniques must be set up to counteract them.
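
Since halo is described here as showing up in the intercorrelations among supposedly separate traits, one rough way to look for it is to average the correlations over every pair of traits. The sketch below is an assumption-laden illustration, not a method taken from the studies cited: the function names and the sample ratings are hypothetical, and a high average is only suggestive, because traits may also be genuinely related.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_intertrait_correlation(ratings):
    """Average correlation over all pairs of traits.

    ratings holds one row per person rated and one column per trait.
    """
    traits = list(zip(*ratings))  # transpose rows into trait columns
    pairs = list(combinations(traits, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical ratings of six workers on four traits (1-5 scale).
    ratings = [
        [4, 4, 5, 4],
        [2, 3, 2, 2],
        [5, 5, 5, 4],
        [3, 3, 2, 3],
        [1, 2, 1, 2],
        [4, 3, 4, 4],
    ]
    print(f"mean inter-trait r = {mean_intertrait_correlation(ratings):.3f}")
```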


Other Aspects of Conventional Rating Procedures

Arbitrary selection of traits. Conventional rating forms usually
include provisions for rating on a number of traits or categories of
behavior. Actual analyses of the job behavior expected of a man have
seldom been made and tested out. Barr (6), from an analysis of 209 teacher
rating scales from.Uo states, concluded that: (a) a great variety of
terms are used to characterize teaching and teaching ability; (b) items
are generally highly subjective and ill-defined; (c) content and
organization vary widely; (d) social and personal traits surpass, both in
frequency and consistency of mention, all other traits enumerated in the
study. Knudsen and Stephens (41) found that, in the majority of the 57
teacher rating devices surveyed, the method used for selecting traits to
be rated was either individual judgment or the selection of items from
other forms.

The use in rating scales of traits or categories of behavior other
than those which are really typical or pertinent for a particular job
may be one of a number of factors which make for lower validity of rating
scores. Flanagan (23) has attacked this problem by use of the "critical
incident" technique. This technique includes the collection, from
qualified observers, of reports of actual incidents of extremely effective
or extremely ineffective job behavior. These incidents are assessed
as to relative frequencies of occurrence and degree of "criticalness."
From the resulting data, rating scales that cover only the significant
aspects of job behavior can be constructed.

Recall of pertinent behavior. If a supervisor is to do an effective
job of rating a subordinate on a certain category of behavior, he must be
able to recall all the important employee performances which are related
to this behavior category. He must then evaluate each of these
performances and arrive at a quantitative summary of them. Obviously this is
very difficult, if not impossible, for the average supervisor, or for
anyone else. This argues in favor of Flanagan's (23) use of statements
of specific behaviors rather than general traits in the construction of
rating forms. It also indicates the need for continuous observation and
recording by the supervisor of behavioral events which might affect
ratings.

Amount of training time required. When rating procedures are found
not to work as effectively as is expected, it seems to be common practice
for the sponsors of the rating form to explain the failure in terms of
lack of training on the part of the raters. Richardson (66) states:
"Although training of raters has improved the quality of ratings to some
extent, it seems evident that the conventional rating scales demand too
much training time. Acceptable ratings have been obtained only as a
result of continuous, expensive training of raters. Actually the shoe is
on the other foot--a good rating procedure should not only require a
minimum of training, but should in itself be a good training device."


Summary

1. A conventional rating procedure has been defined as one in which
the rater can identify the good-poor dimensions. The obviousness of
these dimensions makes it possible for the rater to tell what effect a
given check mark on the rating form will have on the ratee's total rating
score. The rater can raise or lower the ratee's total score as he sees
fit.

2. The validity of rating forms is not usually presented in the
published descriptions of the forms. One reason is that in those
situations in which rating is used there is seldom an adequate standard with
which the rating scores can be compared. In the case of instructor
rating it has been possible in some cases to compare supervisor ratings
with gains in subject-matter knowledge acquired by students. These
relationships are characteristically very slight. There is some evidence
to indicate that rating scores may be more closely related to the social
acceptability of the person rated than to his job proficiency.

3. Several kinds of reliability coefficients have been reported in
connection with studies of rating forms. A high degree of relationship
between the ratings that two different raters give a worker is probably
more important than a high degree of relationship between two ratings by
the same rater. Adequate reliabilities are sometimes but not always
present with conventional rating procedures. Adequate reliability, however,
cannot be accepted as a substitute for adequate validity.

4. A widely prevalent phenomenon in connection with ratings has been
called the error of leniency. This refers to the tendency to give
above-average ratings to the majority of those rated. Such a tendency may
impair validity. Over-rating has been observed under a wide variety of
conditions but is said to be more exaggerated if the rater is in doubt as
to what effect the rating is going to have. Even if it were possible to
overcome the propensity of raters to give high ratings, this would not
necessarily insure valid ratings.

5. The inability of raters to rate separate qualities independently
has been called halo effect. According to some, the presence of this
effect argues for additional training of the raters.

6. The execution of periodic ratings may require that the rater be
able to remember all the pertinent behaviors which should be considered
in passing judgment on an employee's proficiency. He must then be able
to evaluate these and arrive at a rating for a trait or an area of
behavior. Such remembering and evaluating constitutes a task which the rater
may not be capable of carrying out effectively.

7. The criticism has been made that good rating using conventional
rating methods requires an impracticable amount of rater training.


FORCED-CHOICE  RATING  PROCEDURES 
General 


Whether or not the term "forced-choice" is the most appropriate name
for the rating procedure which is to be discussed here is debatable. At
any rate, it has become almost a household phrase among applied
psychologists during the years since World War II. A "forced-choice" rating form
is one in which the rater is required to make a series of choices, from
groups or blocks of descriptive statements, of those statements which are
most (and/or least) descriptive of the person being rated.

While this forced-choice procedure as now used was developed by the
Personnel Research Section of the Adjutant General's Office in 1^4‘i, it
cannot be thought of as without roots in previous psychological research.
It was not, for instance, the first method to make use of behavioral
statements rather than trait definitions. Nor was it the first to use
statements which had been systematically evaluated prior to inclusion.
Richardson and Kuder (67) describe the development in 1933 of a rating
form for salesmen. The procedures used and the rationale underlying
them bear an obvious parental resemblance to those perfected later in
the AGO while Richardson and Kuder were on the staff of the Personnel
Research Section. Since the war Richardson has developed a number of
forced-choice rating forms for the rating of industrial supervisors.
He believes that forced-choice ratings come closer to meeting the
necessary requirements of a sound rating system than do conventional methods.
These requirements Richardson (bo) lists as: (a) it should be geared to
the needs of the individual organization; (b) it must be reliable; (c)
the results of a rating must be expressed in numerical terms; (d) the
results must be useful for administration as well as for counseling and
training; (e) the content must include elements of the job which have
been found to be significant; (f) the ratings must be as free from bias
and prejudice as possible; (g) some means must be built into the device
to counteract the tendency to rate too high; (h) the form must be easy
to fill out; (i) the method should involve a check on the care with
which the form is filled out; and (j) the system must be practical in
the sense that results can be obtained, recorded, evaluated, and
summarized economically.

It should be pointed out that the forced-choice method does not meet
completely all the requirements which Richardson has specified. In and
of itself its usefulness in the counseling and training of the rater's
subordinates has not been demonstrated. However, an additional sheet on
which the rater may indicate in a systematic fashion the individual's
strong and weak points can be appended to the scale. A copy of the
information recorded on this extra sheet may be retained and used by the
rater in the counseling and training of his workers. Space can also be
provided on this sheet for comments in the rater's own words; this gives
the rater a chance to vent any feelings which he may have and which he
thinks may not be adequately expressed in the rating form proper. The
use of such a procedure in connection with a forced-choice instructor
evaluation report has been reported by Seeley (76). As to Richardson's
statement that the rating method should involve, if possible, ways of
checking on the care and skill with which the form has been filled out,
there has been no published information on how this can be accomplished.

A word of caution seems necessary here. While the study reported in
the following section of this bulletin has as one of its aims a
comparison between conventional and forced-choice rating methods, it is
difficult to make a fair comparison from the available research literature.

The materials on the deficiencies of conventional rating methods presented
above grew out of some 30 years of experience with them. Experience with
forced-choice rating methods is very limited. The few published research
reports on forced-choice rating tend to emphasize those aspects of it
that correct the deficiencies of older methods. A review of these reports
thus yields an unavoidable emphasis on the superiorities of the
forced-choice method. It could well be that, as more is learned about the
results of forced-choice ratings under operational rather than experimental
conditions, some of those apparent superiorities will vanish. It is also
quite conceivable that, as experience with forced-choice ratings
accumulates, psychologists may find that the method has its own unique
deficiencies.
Rationale of Forced-Choice


According to Richardson, a basic assumption of the forced-choice
method is that merit rating is best broken down into two distinct phases:
(a) reporting the job performance of a man, and (b) evaluating the record
or estimate of job performance. He contends that conventional rating
procedures force the rater into the necessity of mixing evaluation and
reporting, to the detriment of the latter.


Another important feature of the rationale of forced-choice rating
is that having to choose between two or more statements forces a critical
judgment which is not usually called for by conventional rating procedures.
Richardson believes that the rater must ignore to some extent his general
impression of a man and think back to specific instances of his work
behavior.


In constructing a forced-choice rating form, statements are usually
obtained from comments that supervisors make about workers. It is
considered to be good practice to leave the statements in their original form
as nearly as possible. A minimum of editing is believed to make for ease
of understanding on the part of the raters who use the scale.


The analyses which have been made in the building of forced-choice
rating forms have indicated that comments which supervisors make about
their workers are not of equal significance. Of all the favorable and
unfavorable comments which are made in written reports, only a few are
highly discriminating between effective and ineffective workers. The
forced-choice method attempts to pair discriminating and non-discriminating
statements so that the rater must decide which is more descriptive of the
individual under consideration.


The matching of statements within blocks according to favorable
appearance decreases the possibility of over-rating. The usual pressures
toward lenient rating cannot operate when the rater is unable to
identify which statements contribute to high scores.
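
The pairing idea described above can be sketched in a few lines of code. This is a minimal illustration only, not the procedure of any of the authors cited: the favorableness and discrimination values attached to each statement are invented, and the thresholds `disc_cut` and `max_gap` are hypothetical parameters. Each discriminating statement is paired with the unused non-discriminating statement closest to it in apparent favorableness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    favorableness: float   # hypothetical index of apparent favorableness
    discrimination: float  # hypothetical index separating good from poor workers


def pair_statements(statements, disc_cut=0.5, max_gap=0.3):
    """Pair each discriminating statement with the unused non-discriminating
    statement nearest to it in favorableness (greedy; thresholds are assumed)."""
    discriminating = [s for s in statements if s.discrimination >= disc_cut]
    fillers = [s for s in statements if s.discrimination < disc_cut]
    pairs = []
    # Handle the most discriminating statements first.
    for key in sorted(discriminating, key=lambda s: -s.discrimination):
        if not fillers:
            break
        match = min(fillers, key=lambda s: abs(s.favorableness - key.favorableness))
        # Accept the pair only if the two statements look about equally favorable.
        if abs(match.favorableness - key.favorableness) <= max_gap:
            pairs.append((key, match))
            fillers.remove(match)
    return pairs


# Illustrative data only; every index value here is invented for the example.
pool = [
    Statement("Gives clear and concise directions", 4.2, 0.9),
    Statement("Has a good memory", 4.1, 0.1),
    Statement("Oversteps his authority", 1.8, 0.8),
    Statement("Often absent or tardy", 1.7, 0.2),
]
pairs = pair_statements(pool)
for key, filler in pairs:
    print(f"{key.text!r} <-> {filler.text!r}")
```

Because the two members of a pair look about equally favorable, a rater who wishes to be lenient has no obvious way of telling which choice raises the score.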


Validity


One of the principal justifications made for forced-choice rating
is that it results in better validities than can be obtained with
conventional rating procedures (tv, t'O, 104). It must be noted that these
validities, as in the case of the few reported for conventional rating
procedures, are not correlations of forced-choice scores with some
measure of performance on the job, but rather are comparisons with ranks
or scores that have been assigned to workers by the same supervisors,
using different methods.


Wherry (104) reports a study conducted by the Personnel Research
Section of the Adjutant General's Office in which five different officer
efficiency reporting methods were investigated. One of the five methods
emphasized the forced-choice type of item. This method was found to
have consistently higher validities than the four which were more
conventional in nature.

Richardson (66) reports validities for forced-choice rating forms
ranging from .67 to .'fh. It should be pointed out that Richardson made
use of a purified criterion. That is, persons on whom a reliable rating
external to the forced-choice rating could not be obtained were removed
from the criterion group. Increasing the reliability of the criterion
in this manner undoubtedly has the effect of increasing the computed
validities of the forced-choice rating forms somewhat, but the validities
would probably still be very substantial even if the criterion had not
been purified.
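
The effect of purifying a criterion can be illustrated with the classical attenuation relation of test theory. That relation is not stated in the source and is brought in here only as an assumption: the observed validity is taken to equal the correlation between true scores multiplied by the square root of the product of the two reliabilities. All numeric values below are hypothetical.

```python
import math


def observed_validity(true_validity, predictor_reliability, criterion_reliability):
    """Classical attenuation relation: r_xy = rho * sqrt(r_xx * r_yy)."""
    return true_validity * math.sqrt(predictor_reliability * criterion_reliability)


# Hypothetical values: the same rating form judged against a less reliable
# criterion and against a more reliable ("purified") criterion.
unpurified = observed_validity(0.80, 0.90, 0.60)
purified = observed_validity(0.80, 0.90, 0.85)
print(round(unpurified, 3), round(purified, 3))
```

Under these assumed figures the computed validity rises when the criterion becomes more reliable, although nothing about the rating form itself has changed.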

Reliability--Rater Agreement

Richardson (65) reports reliabilities ranging from .69 to .97
depending on which kind of reliability coefficient was computed. Two
odd-even reliabilities, one of .93 and one of .96, are reported. A
reliability of .97 was obtained from reratings of the same workers by the
same raters using the same forced-choice reporting form. When the
raters did the rerating on different forced-choice forms, reliabilities
of .93 and .97 were obtained. A relationship of .69 was found between
the forced-choice ratings of one rater and those of another rater when
different forced-choice forms were used by each rater.

Seeley (77) reports a corrected odd-even reliability of .77 for a
forced-choice rating form developed for use in evaluating Navy
instructors.

Error  of  Leniency 

Although one of the principal merits of forced-choice rating
methods is supposed to be resistance to deliberate effort to give spuriously
high ratings, little evidence has been presented on this point. The
data reported by Richardson (65) and by Seeley (77) were obtained under
experimental conditions in which rater tendencies toward leniency would
certainly not be maximum. Sisson (80), however, gives a graphical
comparison of data from a conventional rating (Form 67) and from a
forced-choice rating (Form 67-1) of Army officers, when these were used to
obtain actual rather than experimental ratings. This graphical comparison
is presented in Figure 1. While the forced-choice distribution in
Figure 1 is somewhat less skewed (because of more cases at low score
levels) than that resulting from use of the conventional form, the
difference is not impressive.

FIG. 1.--Comparative distributions of Army officer scores, with range of
scores equated, on two types of rating forms (conventional and
forced-choice). (After Sisson.)


Halo Effect


Since forced-choice forms do not attempt to evaluate degree of
possession of various traits, or degree of effectiveness in various
areas, the concept of "halo effect" may not apply. If, however, the
various statements on a form are not so deceptive as to their potential
effect on total score as the adherents of forced-choice believe, then
it would be possible for "halo" to influence the total score.


Other Aspects of Forced-Choice Rating Procedures

Selection of content. In contrast to the usual content of
conventional rating forms (other than those developed by use of the "critical
incident" method), the statements included in forced-choice forms are
ones that have been made by supervisors about the behavior or
characteristics of workers. These statements are then submitted to other
supervisors for evaluation prior to inclusion in the form. The only scorable
statements that would be included in the form would be those that
supervisors agree on as discriminating between good and poor workers. This
should result in considerably more uniform interpretations by raters of
the meaning of an item than is possible when abstract traits are being
rated.

Rater reactions. There have been unofficial reports that the use
of forced-choice methods in the rating of Army officers has met with
considerable resistance on the part of the raters. If such resistance turns
out, with further experience, to be commonly associated with the use of
forced-choice methods, the practical importance of the method is
considerably diminished. While the reasons for the reported resistance have
apparently not been investigated, the following speculations can be offered.

(a) Raters may resent the fact that they do not know how high or low
they have rated a man.

(b) Raters may resent any rating system that really rates. When the
rater knows that what he puts on the rating form will affect the career of
the person being rated, he commonly tries to avoid ratings that would have
an ill effect. The "error of leniency" in conventional ratings is the
rater's way out of an unpleasant responsibility. Forced-choice forms, by
preventing such deliberate over-rating, may create resentment against the
form.


(c) The AGO forced-choice form contained both favorable and
unfavorable statements. Raters may resent having to make choices between
derogatory statements as descriptive of other officers.

There is some evidence to indicate that regardless of how the raters
feel about the ratings this feeling does not invalidate the rating form.
Rundquist, Winer, and Falk (74) state that forced-choice items maintained
their relative validities under operational conditions. The total score
of the forced-choice Army Efficiency Report has also been shown to
maintain its validity when used as the official Army report.


Limited usefulness of results. It is perhaps a legitimate criticism
of the forced-choice method that the completed form is of little use to
the supervisor in counseling his workers as to their strong and weak
points. This may be an inevitable characteristic of any rating method
that has as its purpose the provision of accurate information
regarding the relative abilities of men. Rundquist and Bittner (72) have
pointed out: "Ratings which are to serve as a basis for administrative
action must yield a valid measure of an individual's performance relative
to that of other individuals. The rating system must be specially
designed for this purpose; in such a system the rating form or scale takes
on particular significance .... However, such procedures will probably
be found to be of little use in assigning work, in raising morale, or in
helping people to improve." As mentioned earlier (page 11), this
inadequacy of forced-choice rating procedures can be corrected.

Summary

1. The forced-choice system of merit rating is one in which the
rater is asked to decide which of two or more statements is most
descriptive of the individual being rated. The statements from which the rater
makes his choices are so arranged that pairs of them have the appearance
of being equally favorable things to say about an individual. One of
each pair has been found to discriminate between effective and
ineffective individuals and the other statement of the pair has been found to
be non-discriminating. The blocks of descriptive statements which make
up a forced-choice rating form may include either one or two such pairs
of statements. If there are two pairs in a block, one pair may possess
a different degree of favorableness from the other. In this case the
rater is asked to indicate which of the four statements is most
descriptive and which is least descriptive of the individual.

2. The forced-choice procedure has been said to separate the
reporting of an individual's behavior from the evaluation of that behavior.
The rater is not asked to say how much of a certain trait or behavior an
individual possesses or whether it is good or bad to be like that. He
has only to indicate which of several statements is more typical. Making
this decision may demand a more critical judgment than is needed for the
completion of a conventional form since the rater presumably must
think back to actual incidents before making his decision. The language
used in a forced-choice rating form will probably be more understandable
to the rater since the statements will have been drawn with a minimum
of editing from comments of supervisors themselves.

3. The principal advantages claimed for forced-choice procedure
over conventional methods are resistance to deliberate efforts
to give high scores and superior validity. Reliability is as good as or
better than the reliabilities obtained with conventional rating methods.

4. The reaction of the raters to the execution of forced-choice
ratings may be a principal factor in determining the position of
importance which the forced-choice method attains. If there is
excessive resentment on the part of the raters, this may tend to discredit
the method. The suggestion has been made that the supervisors be offered
opportunity to express their own views in addition to completing the
forced-choice form.


THE DEVELOPMENT OF FORCED-CHOICE FORMS FOR
RATING AIR FORCE TECHNICAL SCHOOL INSTRUCTORS

The Rating Problem

Quality of instruction is crucial in the successful accomplishment
of the mission of the Air Training Command. In order to maintain and
improve quality of instruction, the Air Training Command is interested
in finding methods that will identify, for retention and promotion, those
instructors of superior effectiveness, and, for transfer to other duties,
those of least effectiveness. Conventional rating methods have proved
inadequate for this purpose.

Human Resources Research Center was therefore asked to attempt to
develop rating procedures that would better accomplish the desired
objectives. Based on the reported experiences of other investigators, the
best hope of developing an adequate rating procedure appeared to lie in
the use of forced-choice methods.


The previously reported research on forced-choice, however, left a
number of methodological questions unanswered. In order that the rating
forms and procedures developed for the Air Training Command be the best
possible, these questions needed answering. In addition, it was felt that
such methodological research might be of considerable value to other
psychologists confronted with rating problems.


The Methodological Problems


Kind of forced-choice block. The forced-choice blocks reported by
Sisson (80), Richardson (lH'p''), and Seeley (76) differ in construction.
Figure 2 shows a typical block as reported by Sisson.

Two statements have a favorable and two an unfavorable appearance,
the rater being instructed to check the most descriptive statement and
the least descriptive.


FIGURE 2

Forced-Choice Block--Sisson

A. Fails to support fellow officers
B. Oversteps his authority
C. Gives clear and concise directions
D. Very exacting in all details


FIGURE 3

Forced-Choice Block--Richardson

MOST  LEAST
  A     A    Does not play office politics
  B     B    Is a never-tiring worker
  C     C    Never reverses a decision
  D     D    Has difficulty formulating ideas into words
  E     E    Often absent or tardy

Richardson's form, illustrated in Figure 3, adds a fifth statement
that is neutral in appearance and non-discriminating.


FIGURE 4

Forced-Choice Block--Seeley

a  Able to maintain discipline
b  Has a good memory
c  High degree of efficiency
d  His suggestions and advice are extremely valuable

The form developed by Seeley (see Figure 4) contains only favorable
statements, with instructions to the raters to check the two statements
that are most descriptive of the person being rated.

Since each of these three forms was developed under different
conditions, it is not possible, from the published data, to make valid
comparisons among them. Nor is it known that other ways of combining
statements into blocks might not be superior to those above.

This study was designed in such manner as to provide comparisons
among the above forms. In addition, three new forms were developed and
included in the experiment.

The preference index. Sisson (80) refers to the preference index
or value of a statement variously as (a) "the extent to which people in
general tend to use [it] in describing other people," (b) "general
favorableness," and (c) as "the tendency of raters to mark people high or low
on the particular behavior item." The formula used for obtaining the
index can be related only to this last definition, i.e.,   x 100, where w
is the weight on the 5-point scale in terms of which each item was rated.

Richardson (6^) lists as separate characteristics of a statement
(a) "generally judged favorableness or unfavorableness of the stated
behavior" and (b) "popularity, (preference-value, or more explicitly
use-frequency) of the element." He does not specify the operations for
obtaining indices of either.

Seeley's (76) definition is "preference index, i.e., an average
rating or measure of popularity for each phrase as used to describe
instructors." He computed this index for individual items by taking half
the sum of the mean scores of "best" and "poorest" instructors. These
mean scores were obtained from ratings, on a 5-point scale, of the
degree to which the items described the instructors. This is essentially
the same procedure used by Sisson.
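
Seeley's computation as described above can be written out directly. The sketch below is illustrative only: the ratings are invented, and the difference between the two group means is included merely as an assumed, simple indication of discrimination, not as the discrimination index used in the source.

```python
from statistics import mean


def preference_index(best_ratings, poorest_ratings):
    """Half the sum of the mean ratings given to "best" and "poorest" instructors."""
    return (mean(best_ratings) + mean(poorest_ratings)) / 2


# Hypothetical 5-point ratings of how well one statement describes each instructor.
best = [5, 4, 5, 4]
poorest = [2, 1, 2, 3]

index = preference_index(best, poorest)
gap = mean(best) - mean(poorest)  # assumed stand-in for a discrimination measure
print(index, gap)
```

A statement can thus have a middling preference index and still separate the two groups sharply, which is why the two kinds of index are computed separately.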

Semantically, it would appear that "extent to which people tend to
use a statement" is a different thing from "apparent favorableness" and
that neither of these would be measured by the operations described.
Therefore, in the present experiment, an attempt was made to get ratings
of the "favorableness" of each statement and to compare these with the
"preference index."


Outline  of  Project  Plan 

The following steps constituted the general plan of the project:

(1) Collection of a large number of statements describing the
performance of instructors.

(2) Collection of supervisor performance rankings of instructors
for a sample of the Air Force technical school instructor population.

(3) Obtaining ratings of applicability of the descriptive
statements to certain instructor personnel selected on the basis of the
supervisor rankings.

(4) Obtaining ratings of favorableness for the descriptive
statements.

(5) Computation of preference indices, favorableness indices, and
discrimination indices on the basis of the data obtained in steps (2),
(3), and (4).

(6) Construction of a number of different kinds of forced-choice
rating forms.

(7) Trying out the experimental rating forms on one portion of the
population of Air Force technical school instructors, developing scoring
keys, and cross-validating on another portion of the same population.

Execution of the Project Plan

1. Collection of statements describing instructor performance. The
basic material from which forced-choice rating forms are constructed is
composed of a large number of statements which relate to the performance
of the particular job for which the rating form is being designed.
Therefore, the first concern of this project was the collection of statements
that were, or might be, descriptive of the performance of Air Force
technical school instructors.

One of the principal sources of such statements was written remarks
of instructor supervisors about the performance of various instructors.
These had been written into the space provided for "comments" on a
previous instructor rating form. In addition to this source, statements
were taken from other instructor rating forms and from rating forms used
for rating personnel other than instructors.

A survey of the resulting list revealed that there were a great many
more statements with favorable than with unfavorable tone.  To equalize
this difference, some of the favorable statements were reworded so as to
be unfavorable.  For example, if a statement indicated that the
instructor had behaved in some desirable way, it could easily be reversed
by saying that the instructor did not behave in this way.  The list, with
these changes, totaled [number illegible] statements.

2.  Collection of performance rankings.  In order to establish a
criterion for use in evaluating the statements which had been collected,
the immediate supervisors of the technical training instructors at
Chanute Air Force Base, Illinois, were asked to rank their instructors
as to over-all performance.  Only those supervisors were included who
had under their supervision at least [number illegible] and not more than
20 instructors.

The supervisors were asked to pick from a list of their instructors
the individual they considered to be the most competent of the group and
to indicate his rank as "1."  They were then asked to give the lowest
rank to the individual they considered to be the least competent of the
group.  Following this, they were instructed to identify the second most
competent instructor, the second least competent instructor, etc., until
ranks had been assigned to all the instructors in their group whom they
had known for a period of at least two months.  Assurance was given that
the rankings were to be used solely for experimental purposes.

In order to provide some check upon the reliability of these
supervisor rankings, it was decided to collect similar rankings from the
instructors themselves.  That is, from the list of all instructors in his
group each instructor was asked to cross out his own name and the names
of any instructors whom he had known less than two months.  The
directions for ranking the remaining names were the same as those given
to the supervisors.  These rankings were converted to standard scores and
these were averaged for each man rated.

The ranks assigned independently by instructors and by supervisors
were found to correlate .807 (N = 635 instructors), and thus these
provide fairly reliable criterion data for the study.  This suggests the
question "Why not just use supervisor rankings instead of going to the
trouble of developing a rating scale?"  The answer, of course, is that
for small groups of instructors such rankings may misrepresent the
relative abilities of instructors in different groups.  It is conceivable
that all instructors in one group might be superior to all those in
another.  Rankings would conceal this, while scores from a good rating
scale should reflect it.

3.  Obtaining ratings of applicability of the statements.  On the
basis of the rankings assigned by both the instructor supervisors and
the instructors themselves, two extreme groups of instructors were
picked.  One group consisted of those instructors who were given a top
ranking by their supervisor and who were also rated above average by
their fellow instructors.  The other group consisted of those
instructors who were given a bottom ranking by the supervisor and who
were also rated below average by their fellow instructors.  The 9[?]9
descriptive statements were divided among four forms.  Fifty-four
supervisors were asked to use one of these forms in describing each of
two specific instructors.  The instructors to be described were those who
had been identified both by the supervisor and by fellow instructors as
most and least effective in their group--but the supervisor was not told
this--he was merely asked to indicate on a 5-point scale (from 0 to 4)
the degree of applicability of each statement to those particular
instructors.  This gave 54 ratings of the applicability of each
statement, half on the more and half on the less effective instructors.

4.  Obtaining ratings of favorableness on the statements.  Forty-six
instructor supervisors not used in step 3 were asked to make ratings of
the favorableness of each of the 9[?]9 statements by indicating on a
5-point scale (0-4, 0 denoting very unfavorable) how favorable each
statement was when used with reference to an instructor.

5.  Computing preference, favorableness, and discrimination indices.
Preference indices were computed for each of the statements by obtaining
a measure of the mean descriptiveness or applicability of each statement.
That is, the ratings of the extent to which a statement applied to the
effective instructors (see paragraph 3 above) were combined with the
ratings of the extent to which it applied to ineffective instructors.
The mean of these ratings is an indication of the extent to which the
entire group was ranked high or low on a particular statement.  The range
of these preference indices was from .20 (low) to 3.39 (high).

The index of favorableness was the mean favorableness rating for
each statement (paragraph 4 above).  The range of these indices was from
.20 to 3.99.

The index of discrimination was computed by use of the formula
$(D_H - D_L)\,\dfrac{pq}{y}$, in which $D_H$ is the mean descriptiveness¹
of the statement for effective instructors, $D_L$ is the mean
descriptiveness of the statement for ineffective instructors, $p$ is the
proportion of the total number of instructors in the high group, $q$ is
the proportion of the total number of instructors in the low group, and
$y$ is the ordinate of the normal curve corresponding to the values of
$p$ and $q$.  The range of the resulting discrimination indices was from
-1.46 to +1.59.  Figures 5 and 6 show the distribution of these
discrimination indices for each level of preference and for each level
of favorableness.
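
The computation in paragraph 5 can be sketched as follows. This is only
an illustration under a stated assumption: the printed formula is partly
illegible, and the sketch assumes the biserial-type form
(D_H - D_L) * p * q / y, which uses exactly the symbols the text defines.
The function name and the default values p = q = .5 are hypothetical.

```python
from statistics import NormalDist


def discrimination_index(mean_high, mean_low, p=0.5, q=0.5):
    """Assumed form of the index: (D_H - D_L) * p * q / y.

    mean_high, mean_low: mean applicability ratings (0-4 scale) of one
        statement for the effective and ineffective instructors.
    p, q: proportions of instructors in the high and low groups.
    y: ordinate of the unit normal curve at the point dividing it
        into areas p and q.
    """
    if abs(p + q - 1.0) > 1e-9:
        raise ValueError("p and q must sum to 1")
    unit_normal = NormalDist()          # mean 0, standard deviation 1
    z = unit_normal.inv_cdf(p)          # cut point for proportions p and q
    y = unit_normal.pdf(z)              # height of the curve at that point
    return (mean_high - mean_low) * p * q / y


# A statement rated 3.0 for effective and 1.0 for ineffective instructors.
print(round(discrimination_index(3.0, 1.0), 2))
```

With equal groups the multiplier p * q / y is about .63, so on a 0-4
scale this assumed form can yield values beyond 1.00 in either direction,
which is at least compatible with the reported range of -1.46 to +1.59.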

6.  Comparison of preference and favorableness indices.  Because of
the bimodal distributions of both preference and favorableness indices,
the distribution of favorableness indices was divided at the midpoint of
the scale (2.00), and the upper and lower halves were correlated with
the preference indices of the same statements.  The resulting
coefficients were -.03 and +.06, respectively.  Apparently, as noted
below, these are not indices of the same thing.

The bimodality in terms of the preference index apparently resulted
from the fact that favorable items tended to be rated relatively high on
applicability to good instructors, but not necessarily to be rated
extremely low on applicability to poor instructors.  Conversely, items
with low favorability indices tended to be rated as having low
applicability to good instructors, but not necessarily as having high
applicability to poor instructors.  This tendency among raters produced
bimodality in the distribution of preference indices and induced an
apparently high relationship between preference and favorability indices
(r = .69) when data on both favorable and unfavorable items were pooled.
However, a negligible relationship is evident between the two indices
when data from favorable and unfavorable items are considered separately.

¹ Applicability rating as defined in paragraph 3.

[Figure: Distribution of discrimination indices at various preference
levels.]
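
The contrast between the pooled correlation and the within-half
correlations can be illustrated with a small sketch. The eight pairs of
values below are invented for illustration and are not the study's data;
`pearson_r` and `split_at_midpoint` are hypothetical helper names. The
data are built so that the two indices are unrelated within each half of
the favorableness scale, yet strongly related once the halves are pooled.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_at_midpoint(fav, pref, midpoint=2.0):
    """Divide the statements at the midpoint of the favorableness scale."""
    lower = [(f, p) for f, p in zip(fav, pref) if f < midpoint]
    upper = [(f, p) for f, p in zip(fav, pref) if f >= midpoint]
    return lower, upper


# Invented indices: four unfavorable and four favorable statements.
favorableness = [0.5, 1.0, 1.5, 1.0, 2.5, 3.0, 3.5, 3.0]
preference = [1.0, 0.5, 1.0, 1.5, 3.0, 2.5, 3.0, 3.5]

lower, upper = split_at_midpoint(favorableness, preference)
r_pooled = pearson_r(favorableness, preference)
r_lower = pearson_r(*zip(*lower))
r_upper = pearson_r(*zip(*upper))
print(round(r_pooled, 2), round(r_lower, 2), round(r_upper, 2))
```

In this constructed case the pooled coefficient reflects only the
distance between the two clusters, which is the situation the text
describes for the bimodal distributions.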

The reasons for this lack of agreement appear to lie in the nature
of the two indices.  The preference index is the mean degree of
applicability of a statement to the entire population (or to the high and
low extremes thereof).  The same mean degree of applicability could
result from statements that differed considerably in their degrees of
applicability to the high and low groups.  For instance, three statements
with applicability (or descriptiveness) mean scores of 4, 3, and 2 for
the high group would yield the same preference index when the mean scores
for the low group were 0, 1, and 2, respectively.  But the purpose of
computing such an index is so that statements that appear equally
favorable can be paired in the forced-choice blocks.  And statements with
descriptiveness means of 4 and 0 for the high and low groups,
respectively, inevitably appear to be more favorable than statements for
which the descriptiveness is the same for both groups.  The preference
index, being an average, obscures these differences.
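
The three-statement example above can be worked through numerically.
The sketch assumes equal numbers of ratings for the high and low groups
(the text states that half the ratings were on each group), so that the
preference index is the simple average of the two group means; the
function name is hypothetical.

```python
def preference_index(mean_high, mean_low):
    """Mean applicability over both groups, assuming equal group sizes."""
    return (mean_high + mean_low) / 2


# (high-group mean, low-group mean) for the three statements in the text.
statements = [(4, 0), (3, 1), (2, 2)]

for mean_high, mean_low in statements:
    index = preference_index(mean_high, mean_low)
    spread = mean_high - mean_low
    print(f"high={mean_high} low={mean_low} "
          f"preference={index:.2f} high-minus-low={spread}")
```

All three statements receive the same preference index, while the
difference between the group means runs from 4 down to 0.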

The favorableness index, on the other hand, was a direct attempt to
ascertain how favorable a statement looked to the supervisors who were
ultimately to use the forced-choice form.  Since the preference and
favorableness indices were dissimilar, and since the latter seemed more
likely to represent a statement's appearance of favorableness, the
favorableness index was used for the balance of the study.

7.  Construction of the forced-choice rating forms.  The forced-choice
method calls for the pairing, within blocks, of statements that are of
equivalent favorableness, but that differ in discrimination.  As can be
seen from Figure 6, while many such statements were available, the
number that could be used depended on how large a difference in
discrimination indices was demanded.  If this difference were set too
high, then only those statements at the extremes of the distribution of
discrimination indices within each favorableness level could be used.  If
it were set too low, the statements might fail to discriminate when put
into the forced-choice blocks.  In this study a difference of .60 was
chosen solely because it was the largest difference that would make
available the number of statements necessary for forms of optimum length.
However, this leaves an interesting methodological question unanswered:

What should the discrimination difference between two statements be
in order to achieve maximum economy in the use of the pool of statements,
maximum validity and reliability, and minimum biasability of the final
forced-choice form?  One might expect that as larger discrimination
differences were used reliability would increase, validity might
increase, but resistance to bias might decrease.  The discrimination
difference that will provide an optimum relationship between these
desirable characteristics of a forced-choice form needs identification.
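
The pairing rule can be sketched as a simple greedy procedure. This is
not the procedure used in the study, which is not described in detail;
it is a minimal sketch assuming that "equivalent favorableness" means a
favorableness difference no larger than some tolerance. The tolerance of
.20, the class `Statement`, the function `build_pairs`, and the third
statement and its indices are all hypothetical; only the .60
discrimination difference comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    fi: float  # favorableness index
    di: float  # discrimination index


def build_pairs(statements, min_di_gap=0.60, max_fi_gap=0.20):
    """Greedily pair statements of similar FI whose DIs differ enough.

    max_fi_gap is an assumed tolerance; the source gives no figure.
    """
    remaining = sorted(statements, key=lambda s: s.fi)
    blocks = []
    while remaining:
        first = remaining.pop(0)
        for i, other in enumerate(remaining):
            if other.fi - first.fi > max_fi_gap:
                break  # list is sorted by FI, so no later match is possible
            if abs(other.di - first.di) >= min_di_gap:
                blocks.append((first, remaining.pop(i)))
                break
    return blocks


pool = [
    Statement("Aim of lesson is clearly presented.", 2.78, 0.63),
    Statement("Refrains from boasting of his experiences.", 2.61, 0.02),
    Statement("Hypothetical third statement.", 2.70, 0.60),
]
for a, b in build_pairs(pool):
    print(a.text, "|", b.text)
```

Raising `min_di_gap` shrinks the number of usable blocks, which is the
trade-off between economy and discrimination that the text describes.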


Utilizing this discrimination difference of .60, six forced-choice
forms were constructed as shown below.  The favorableness indices (FI)
and discrimination indices (DI) given for each statement are merely
illustrative--they are not the correct indices for the statements shown.


FORM A

Seventy-three blocks, two statements per block.  There were roughly
equal numbers of favorable and unfavorable blocks.

Summary of Directions:  Pick the statement which is more descriptive
                        (favorable blocks), or less descriptive
                        (unfavorable blocks).

Sample Blocks:  a.  Aim of lesson is clearly presented.
                    (FI 2.78, DI .63)
                b.  Refrains from spending too much time boasting
                    of his experiences.  (FI 2.61, DI .02)

                a.  May "bawl out" or ridicule a student in the
                    presence of others.  (FI .90, DI -.95)
                b.  Doesn't get to know each student's problems.
                    (FI .87, DI -.34)


FORM B

Thirty-four blocks, three statements per block.  One of the three
statements had a discrimination index .60 higher than the other two.
There were roughly equal numbers of favorable and unfavorable blocks.

Summary of Directions:  Pick the statement which is most descriptive
                        and the one which is least descriptive in
                        each block.

Sample Blocks:  a.  Does not answer all questions to the
                    satisfaction of the students.
                    (FI 1.43, DI -.20)
                b.  Does not use proper voice volume.
                    (FI 1.47, DI -.80)
                c.  Supporting details are not relevant.
                    (FI 1.40, DI -.15)

                a.  Conducts class in orderly manner.
                    (FI 2.2[?], DI 1.[?]0)
                b.  Repeats questions to the whole class before
                    answering them.  (FI 2.29, DI .57)
                c.  At ease before class.  (FI 2.15, DI .53)


FORM C

Thirty-one blocks, four statements per block.  All statements had
high favorableness indices.

Summary of Directions:  Pick the two statements which are most
                        descriptive.

Sample Block:   a.  Patient with slow learners.  (FI 2.82, DI 1.15)
                b.  Lectures with confidence.  (FI 2.75, DI .5[?])
                c.  Keeps interest and attention of class.
                    (FI 2.89, DI 1.59)
                d.  Acquaints classes with objective for each
                    lesson in advance.  (FI 2.85, DI 1.19)


FORM D

This form was identical with Form C except for the directions.

Summary of Directions:  Pick the statement which is most descriptive
                        and the one which is least descriptive in
                        each block.

Sample Block:   Same as Form C.


FORM E

Thirty-two blocks, four statements per block.  Two had high and two
had low favorableness indices.

Summary of Directions:  Pick the statement which is most descriptive
                        and the one which is least descriptive in
                        each block.

Sample Block:   a.  Fine personal bearing.  (FI 3.01, DI 1.21)
                b.  Adapts himself readily to new duties.
                    (FI 2.08, DI .09)
                c.  Is not well qualified to instruct in all phases
                    of his subject.  (FI .65, DI -.75)
                d.  Does not put class at ease.  (FI .78, DI -.13)


FORM F

Thirty-six blocks, five statements per block.  Two had high and two
had low favorableness indices; the fifth statement had a favorableness
index midway between the high and low pairs, and a low discrimination
index.

Summary of Directions:  Pick the statement which is most descriptive
                        and the one which is least descriptive in
                        each block.

Sample Block:   a.  Works hard.  (FI 3.26, DI 1.39)
                b.  Somewhat antagonistic about what he is
                    instructed to do.  (FI 1.[?]2, DI -.[?]0)
                c.  Could improve cleanliness of classroom area.
                    (FI 1.89, DI .09)
                d.  Not willing to adapt to changing situations.
                    (FI 1.26, DI -.30)
                e.  Can take criticism.  (FI 3.30, DI .78)


Form C uses Seeley's (76) method of constructing blocks.  Form E
uses the AGO method reported by Sisson (80), and Form F follows
Richardson (64).  Forms A, B, and D are rather obvious alternative
constructions developed for this experiment.

8.  Experimental testing of the six rating forms.  The six rating
forms were experimentally administered at Chanute, Scott, Keesler,
Warren, Lowry, and Sheppard Air Force Bases.  For purposes of
cross-validation, the instructors at the first three bases named were
used as one sample and those from the last three bases as another.

In addition to the administration of the forms at the bases indicated,
supervisors were asked to rank their subordinates as to their over-all
effectiveness as instructors, using the method described on page 20.
These ranks were converted into a normalized ranking score.  This was
necessary so that ranks from groups of different sizes could be given
comparable meaning.  The procedure, however, makes the assumption that
the mean level of performance of all groups is the same.  Since many of
the groups consisted of but 5 or 6 instructors, such an assumption is
almost certainly unwarranted.  To the extent that the assumption is
unwarranted, these normalized scores will give an inaccurate report of
the relative abilities of instructors from different groups.  Since these
are the criterion scores against which the six rating forms were
validated, it can be assumed that the validity coefficients obtained are
conservative estimates.
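
The effect of the equal-means assumption can be seen in a short sketch.
The report does not state how ranks were normalized; the sketch assumes
one common convention, in which rank r in a group of n is placed at the
midpoint percentile (r - .5)/n and converted to a unit normal deviate.
The function name is hypothetical.

```python
from statistics import NormalDist, mean


def normalized_rank_scores(group_size):
    """Normal deviates for ranks 1..n (rank 1 = most competent).

    Assumed convention: rank r sits at percentile 1 - (r - 0.5) / n.
    """
    unit_normal = NormalDist()
    return [
        unit_normal.inv_cdf(1 - (rank - 0.5) / group_size)
        for rank in range(1, group_size + 1)
    ]


small_group = normalized_rank_scores(5)
large_group = normalized_rank_scores(20)

print([round(z, 2) for z in small_group])
print(round(mean(small_group), 6), round(mean(large_group), 6))
```

Every group receives a mean score of zero whatever its true level of
ability, so the top man in a weak group and the top man in a strong
group of the same size are scored alike. That is the misrepresentation
the text warns of, and it bears most heavily on small groups.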

At Scott and Keesler Air Force Bases, supervisors were also asked
to fill out the rating forms according to the following directions:
"Fill out this rating form as if you were rating your best friend and
wanted to make certain that he obtained as high a score as possible."
The data gathered in this manner will be referred to hereafter as the
"bias experiment."

The number of cases which were obtained by form and base is presented
in Table 1.


TABLE 1

NUMBER OF EXPERIMENTAL CASES ACCORDING TO RATING FORM AND BASE

                                   Form                     No.
Base                 A     B     C     D     E     F      Raters
Chanute             67    77    63    72    73   [?]        90
Scott               46    46    43    49    46    41        56
Keesler             50    56    33    52    53    45        37
Warren              42    38    43    44    39    42        26
Lowry               49    51    50    45    43   [?]        24
Sheppard            38    46    38    42    47    44        25
Bias (Scott)        65    66    57    85    65    67
Bias (Keesler)      25    28    31    28    28    28

[?] One entry in this row is illegible in the source, and the position
of the gap within the row is uncertain.

Discussion and Results

Analysis Procedures.  For purposes of analysis the data collected
from Scott, Chanute, and Keesler Air Force Bases were combined and will
be referred to as Group I.  The combined data from Lowry, Warren, and
Sheppard Air Force Bases are designated Group II.

For each group the following procedures were carried out:  Using
the normalized ranking score as a criterion, the highest one-third and
the lowest one-third of the completed forced-choice forms were selected.
Graphic item counts were run for each response position² for the high
and low groups for each of the six kinds of rating forms.  These counts
were transferred to summary sheets and validity indices were determined
by the use of Davis (15) tables.

Five experimental scoring keys were made for each of the six
forced-choice rating forms:

Key 1.  This key was based on the original indices of discrimination
of the statements, in terms of the Chanute sample (see p. 22).  Weights
of +1, 0, or -1 were assigned depending on the relative size and sign of
the indices.

Key 2.  This key was based on item analysis of Group I data.  Using
Davis item validity indices, response positions having indices above +6
were weighted +1, those below -6 were weighted -1, and those between -6
and +6 were given zero weight.  No blocks were scored unless one or more
statements had indices above +12 and one or more statements had indices
below -12.  No minus weights were used for Form A.

² The term "response position" refers to the possible responses to
the statements in the forced-choice form.  For example, if a
forced-choice block contains three statements and the instructions are to
check the most and least descriptive statements, there would be two
possible response positions for each statement (most and least
descriptive) and six possible response positions in the block.

Key 3.  This key used item-analysis results, but unit weights were
assigned in accordance with the logical relations of the various response
positions.  Blocks were selected for scoring in the same manner as for
Key 2.  Opposite responses for the same statement took opposite weights,
unless both alternatives had Davis indices between +6 and -6.  In the
latter case, both response positions were weighted zero.  For Form A,
Keys 2 and 3 were identical.  Since in this form there were only two
alternatives per block, the weighting of one alternative positively
required assignment of a negative weight to the other.

Key 4.  This key was developed in the same manner as Key 2, but
using Group II data.

Key 5.  This key was developed in the same manner as Key 3, using
Group II data.  For Form A, Keys 4 and 5 were identical.
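
The weighting rule described for Key 2 can be sketched as follows. The
cutoffs of 6 and 12 come from the text; everything else is an assumption
made for illustration: the Davis indices shown are invented, the
block-selection rule is applied to response positions rather than to
statements, an index of exactly 6 or -6 is given zero weight, and the
function and position names are hypothetical.

```python
def key2_weight(davis_index, cutoff=6):
    """Weight one response position: +1 above +6, -1 below -6, else 0."""
    if davis_index > cutoff:
        return 1
    if davis_index < -cutoff:
        return -1
    return 0


def block_is_scored(davis_indices, extreme=12):
    """Score a block only if it has an index above +12 and one below -12."""
    values = list(davis_indices)
    return any(v > extreme for v in values) and any(v < -extreme for v in values)


def build_key(blocks):
    """Map (block number, response position) to a weight for scored blocks."""
    key = {}
    for block_id, positions in blocks.items():
        if not block_is_scored(positions.values()):
            continue  # non-discriminating block: left out of the key
        for position, index in positions.items():
            key[(block_id, position)] = key2_weight(index)
    return key


def score_form(checked_positions, key):
    """Total score: sum of the weights of the positions the rater checked."""
    return sum(key.get(position, 0) for position in checked_positions)


# Invented Davis indices for two blocks of a hypothetical form.
blocks = {
    1: {"a_most": 15, "a_least": -14, "b_most": 4, "b_least": -3},
    2: {"a_most": 8, "a_least": -7, "b_most": 2, "b_least": 1},
}
key = build_key(blocks)
print(score_form([(1, "a_most"), (2, "a_most")], key))
```

Block 2 has no index beyond 12 in either direction, so it contributes
nothing to the score however the rater responds.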

In the construction of the four keys based on item-analysis data,
it was found that some blocks on each of the six forms did not yield
scores that differentiated significantly between the high and low groups.
This would seem to indicate that the discriminative power of a statement
may be different when considered alone than when considered in
comparison with certain other statements.  Or it may be that the
discrimination index, as determined here, gives only a very rough
indication of the relative discriminating power of the statements.

Table 2 shows the extent of shrinkage of each form when
non-discriminating blocks are eliminated after analysis of Group I data.

TABLE 2

EFFECT OF ITEM ANALYSIS ON LENGTH OF FORMS³

           Original Length        Scorable Length       Per Cent
Form     Blocks   Statements    Blocks   Statements    Shrinkage
A          72        144          32         64            55
B          34        102          16         48            53
C          31        124          26        104            16
D          31        124          20         80            35
E          32        128          30        120             6
F          36        180          34        170             6

³ Based on Group I data and Key 2.

From these data it would appear that, if two- or three-choice blocks are
to be used, considerable shrinkage should be anticipated.  Whether it is
necessary to correct for this by starting with longer experimental forms
should be determinable from the relationships of the number of statements
in a form to the form's coefficients of validity and reliability.  These
relationships are explored below.

Validities.  As mentioned earlier, the criterion in this study
consists of rankings by instructor supervisors of their subordinates'
over-all competence as instructors.  It is recognized that such a
criterion is not necessarily a valid measure of the actual effectiveness
of instructors.  A more valid measure would probably be based on the
relative effectiveness of instructors in producing changes in the
behavior and attitudes of students.  Such an ultimate criterion is
particularly difficult to obtain in a training situation such as that in
Air Force technical schools, since the many different courses present an
extremely wide range of difficulty, and the teaching of each instructor
is commonly limited to a small phase of one course.

The principal reason for the present project was the need for a
rating form or forms that would accurately report, for administrative
uses, what supervisors believed to be the relative effectiveness of their
instructors.  In terms of this limited objective the use of rankings as
the criterion is considered justified.

Table 3 presents the correlation coefficients which were obtained
between the criterion and scores from each of the five keys on each of
the six forms for the two groups of data.

An examination of these data reveals a rather unusual situation.
It would be expected that the keys developed on Group I would produce
higher validities when they were used on Group I than they would produce
when they were used on Group II.  This expectation is not borne out by
the data.  Of the 17 correlations obtained using Group I keys (Keys 1,
2, 3) on Group II data, all but one of the comparable correlations are
higher for Group II than they are for Group I.  The situation is
decisively reversed when Group II keys (Keys 4, 5) are used on Group I
data.  In this case, all of the correlations are lower for Group I than
for Group II, i.e., are lower on cross-validation.

This presence of consistently higher validities for Group II than
for Group I casts doubt on the Group I criterion data.  An examination
of the criterion for both groups reveals that Group I contains a
predominance of small groups.  That is, many of the groups which the
supervisors in Group I ranked contained four, five, or six instructors.
With the Group II data, the average size of the groups ranked was
considerably larger.  As discussed on page 20, when rankings are
converted to normalized standard scores, the resulting scores may
misrepresent the relative abilities of instructors who come from groups
that differ in mean ability.  The smaller the groups, the more probable
it is that such misrepresentation will occur.  It is considered,
therefore, that cross-validation data obtained when Keys 1, 2, and 3 are
used on Group II may be a better indication of the relative validities of
the six experimental forms than the cross-validation data resulting from
Keys 4 and 5 on Group I.

TABLE 3

VALIDITIES OF FORCED-CHOICE FORMS USING VARIOUS KEYS
(Correlations with rankings)

Group I
                             KEY
Form        1*        2*        3*        4         5
A          .486      .609       --       .501       --
B          .551      .609      .595      .553      .502
C          .44[?]    .573      .562      .451      .464
D          .523      .619      .602      .565      .545
E          .546      .567      .573      .535      .548
F          .492      .547      .537      .450      .494

Group II
                             KEY
Form        1         2         3         4*        5*
A          .5[?]0    .636       --       .670       --
B          .475      .577      .525      .681      .624
C          .655      .703      .704      .714      .674
D          .643      .680      .663      .712      .708
E          .584      .537      .549      .620      .616
F          .583      .564      .664      .609      [?]

Keys 2 and 3 were derived on Group I data, Keys 4 and 5 on Group II
data.  Except for being based on different samples, Keys 2 and 4 are
comparable, as are Keys 3 and 5.  The asterisk in certain column headings
denotes that the correlations listed were based on the same sample used
to derive the key in question and are therefore to some degree spurious.

It should be noted, however, that Keys 1, 2, and 3 are derived
exclusively from Group I and, therefore, may be expected to suffer from
the limitations of this group.  It may be assumed that the deficiencies
existing in Group I are likely to affect all six experimental forms in
the same way and hence will not alter the relative order of the
validities which would result if most of the deficiencies did not exist.
It seems advisable, then, to consider all the cross-validation data in
evaluating the various experimental forms.  These data are shown in
Table 4.  Inspection of Table 4 yields little evidence that the relative
validity of the various keys differed markedly over the six forms.
Therefore, averaging of the data by keys cannot be considered to obscure
"key by form" interactions.

As validity is only one of the important characteristics of a
forced-choice rating plan and the validity under operational conditions
remains to be evaluated, this is not to be considered a final ranking of
the relative value of the forms.  It is interesting to note that the two
forms containing all favorable statements in each block, Forms C and D,
show consistently higher cross-validation coefficients than any other
forms.

The validities of all forms exceed a correlation of .404 obtained
between a sample of 442 scores from the graphic rating form previously
used to rate instructors and supervisor rankings.  It may be noted also
that the magnitudes of the average validity coefficients do not appear
to be primarily determined by the lengths of the various forms.

It is planned that Keys 2, 3, 4, and 5 will be evaluated later in
this study, after data are collected under operating conditions.  Key 1
will probably be dropped since it is considered inferior to the others,
having been developed prior to the item analysis and having been derived
from the discrimination indices for the statements taken singly (not in
blocks) and based on only part of the Group I criterion data (those from
Chanute AFB).  For parts of the balance of this study it was necessary
to choose one key.  Any one of the four keys (2, 3, 4, or 5) might
properly have been chosen.  Preference for a completely empirical key
narrowed the choice to either Key 2 or Key 4.  Key 2 was available
earlier in the study and some of the analysis had already been completed
with it.  Hence, when only one key is used in the balance of this
bulletin it is Key 2 rather than Key 4.

⁴ Using Keys 1, 2, 3 on Group II and Keys 4 and 5 on Group I.


The distribution statistics for the six forms under both experimental
and bias conditions are presented in Table [number illegible].

The upward shift of the mean from experimental to bias conditions
can be taken as one measure of the extent to which raters have succeeded
in biasing the scores.  In order to compare the magnitudes of these
shifts, the differences between the means for Group II on Key 2 and the
biased means on Key 2 were divided by the standard deviations of the
Group II distributions.  This gave the following ranking of forms in
order of decreasing resistance to bias:


            Mean (Bias) - Mean (Group II)
FORM        -----------------------------
                    SD (Group II)

 C                      .14
 B                      .60
 A                      .71
 D                      .74
 E                      .74
 F                      .74
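The index behind this ranking is simple enough to sketch in code. The following is an illustrative sketch only, not the computation actually used for the bulletin: the function name `standardized_shift` is hypothetical, and the inputs are the Key 2 means and standard deviations for Forms B and D as they read in Table 5.

```python
def standardized_shift(bias_mean, group2_mean, group2_sd):
    """Upward shift of the mean under bias instructions, expressed in
    Group II standard-deviation units."""
    if group2_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (bias_mean - group2_mean) / group2_sd


# Key 2 figures as read from Table 5:
# (bias mean, Group II mean, Group II SD)
key2_stats = {
    "B": (7.9, 1.2, 11.1),
    "D": (9.8, -0.2, 13.6),
}

shifts = {
    form: round(standardized_shift(*stats), 2)
    for form, stats in key2_stats.items()
}

# A smaller shift indicates greater resistance to bias.
ranking = sorted(shifts, key=shifts.get)
print(shifts, ranking)
```

For these two forms the sketch gives .60 and .74, which agree with the values listed above.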


A clearer picture of the over-all effects of biasing instructions
on the distributions of scores from the various forms can be gained from
Figures 7 through 10. Forms E and F are clearly less bias-resistant
than the other forms. The bias distribution for Form E is quite similar
to the distribution reported by Sisson for the same kind of form under
conditions of actual use (Figure 1). Forms E and F are the only forms
of the six in which favorable and unfavorable statements are included in
the same block. This probably increased the biasability of the form
because the rater is almost certain that "most" answers to favorable
statements and "least" answers to unfavorable statements are likely to
increase the total score. Forms with unmixed blocks do not offer such
information to the rater.

Form C, which was shown to have the smallest shift of mean score
under instructions to bias and was one of the two highest in validity,
also yields the most normal appearing distribution of scores under those
conditions.

It seems probable that the bias obtained in this experiment is
maximum and that the actual amount of effort to bias that would be exerted
when a form was in regular use would be somewhat less. The distributions
that could therefore be expected when these forms were put into use should
be somewhere between those obtained here under regular experimental
conditions and those obtained under instructions to bias.

Reliability. Reliability coefficients for the six forms are presented
in Table 6. The procedures used in their computation differ somewhat from
the conventional odd-even method.



TABLE 5

DISTRIBUTION STATISTICS FOR SIX FORCED-CHOICE FORMS UNDER EXPERIMENTAL
CONDITIONS AND UNDER DIRECTIONS TO ATTEMPT TO GIVE AS HIGH A
SCORE AS POSSIBLE (BIAS)

                  Group I              Group II               Bias
Form  Key     N    Mean     SD      N    Mean     SD      N    Mean     SD

 A     1     163   36.8     9.4    127   36.4    10.8     90   42.9     6.6
 A     2     163   16.7     6.2    127   17.1     6.2     90   21.9     4.0
 A     4     163   23.2     8.1    127   22.9     9.4     90   28.4     9.1

 B     1     181   34.3     7.8    133   34.1     9.3     94   39.6     9.2
 B     2     181    2.4    10.7    133    1.2    11.1     94    7.9     7.0
 B     3     181   17.7     5.8    133   17.3     6.2     94   21.0     3.8
 B     4     181    2.4    15.2    133    2.1    19.8     94   11.9     9.9
 B     5     181  2[?].1    [?]    133   24.9     8.6     94   29.8     9.3

 C     1     138   32.5     8.2    129   33.3     8.9     88   36.2     9.7
 C     2     138   -1.5    13.1    129   -3.9    13.2     88    4.3     8.9
 C     3     138   18.4     6.9    129   18.1     6.7     88   21.4     4.1
 C     4     138    0.2    10.3    129    0.3    12.7     88    9.9     6.9
 C     5     138  17.[?]    9.4    129   17.5     7.2     88   20.9     3.7

 D     1     173    2.4    10.6    129    2.2    10.0     93    9.3     6.3
 D     2     173    0.8    14.0    129   -0.2    13.6     93    9.8     9.7
 D     3     173   21.5     7.7    129   21.1     7.6     93   27.1     9.2
 D     4     173    2.1    19.0    129    0.9    17.4     93   13.3    11.3
 D     5     173  [?]7.7    8.1    129   27.2     9.0     93   34.3     6.2

 E     1     168   21.6    17.1    132   18.8    23.2     93   38.4    13.4
 E     2     168   10.9     [?]    132    8.9    26.7     93   28.3    18.0
 E     3     168   20.0    20.3    132   16.9    22.1     93   30.3    13.2
 E     4     168   13.9    24.9    132   12.4   26.[?]    93   31.1    16.7
 E     5     168   15.6    20.1    132   18.7    23.6     93   33.0    13.7

 F     1     165   20.4    22.0    128  18.[?]  24.[?]  [?]9   36.2    11.9
 F     2     165   16.9    29.1    128  1[?].8   30.0   [?]9   37.9    13.6
 F     3     165   16.0    20.2    128   19.3    21.7   [?]9   30.3     9.2
 F     4     165   19.2    29.4    128   19.8    39.0   [?]9   42.9    19.7
 F     5     165   20.5    23.0    128   19.3    29.6     99   36.6    11.4

a. The following data were used in computing reliabilities:




(1) Group I and Group II answer sheets were combined and scored
with Key 2.

(2) The Bias Group was scored with Key 2.

b. The results of the earlier item analysis of the six instructor
description forms were used in splitting the blocks of each form into
two comparable groups as follows:

(1) A validity index was obtained for each block in a given
form.

(2) The blocks were arranged in rank order with respect to this
validity index.

(3) An odd-even split of the blocks was then made from this
rank order.

(4) Minor adjustments in the split were made to balance on the
criterion of number of choices scored per block.

c. Coefficients of correlation were obtained between the odd and
even blocks for each form.
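The splitting procedure in steps b and c can be sketched as follows. This is a minimal illustration under stated assumptions, not the bulletin's actual computation: the function names and the data are hypothetical, the correlation is taken to be the ordinary product-moment coefficient, and the minor adjustments of step b(4) are left out.

```python
from math import sqrt


def split_blocks(validity):
    """Rank blocks by validity index (highest first), then assign them
    alternately to two halves, i.e. an odd-even split of the rank order."""
    ranked = sorted(range(len(validity)), key=lambda b: validity[b], reverse=True)
    return ranked[0::2], ranked[1::2]


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(block_scores, validity):
    """block_scores[i][b] is the score of instructor i on block b."""
    odd, even = split_blocks(validity)
    odd_totals = [sum(row[b] for b in odd) for row in block_scores]
    even_totals = [sum(row[b] for b in even) for row in block_scores]
    return pearson(odd_totals, even_totals)


# Hypothetical data: four blocks, five rated instructors.
validity = [0.30, 0.55, 0.42, 0.61]
block_scores = [
    [1, 2, 1, 2],
    [0, 1, 1, 1],
    [2, 2, 1, 2],
    [0, 0, 1, 0],
    [1, 1, 0, 2],
]

odd, even = split_blocks(validity)
r_half = split_half_r(block_scores, validity)
print(odd, even, round(r_half, 3))
```

Ranking before splitting is what keeps the two halves comparable in block validity: adjacent blocks in the rank order always fall in opposite halves.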

It is evident from Table 6 that the three forms with fewer than 100
statements each (A, B, D) have the lower coefficients (Cols. 3, 4). When a
correction is made for the length of the forms, all coefficients reach
satisfactory levels (Cols. 6, 7). It would therefore seem to be advisable,
when constructing forced-choice forms made up of 2- or 3-choice blocks,
to include in the experimental forms enough extra statements to
compensate for the excessive shrinkage that takes place with forms so
constructed.

When forced-choice forms are constructed in the manners reported here,
using discrimination differences around .60, it appears that final forms
about 100-120 statements in length should yield reliability coefficients
around .90.
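The text does not name the formula used to step up the split-half coefficients or to correct for form length. A minimal sketch, assuming the standard Spearman-Brown formula was meant, is given below; the function names and the numerical inputs are hypothetical.

```python
def spearman_brown(r, length_factor):
    """Predicted reliability when a form is lengthened (or shortened)
    by `length_factor`, assuming the added material is comparable."""
    return length_factor * r / (1 + (length_factor - 1) * r)


def step_up(r_half):
    """Full-length reliability from a split-half correlation."""
    return spearman_brown(r_half, 2)


def equalize_length(r, n_statements, target_statements):
    """Reliability projected to a form of `target_statements` statements."""
    return spearman_brown(r, target_statements / n_statements)


# Hypothetical inputs, chosen only to show the arithmetic.
print(round(step_up(0.50), 3))                   # half-form r of .50
print(round(equalize_length(0.80, 60, 120), 3))  # 60-statement form projected to 120
```

Under this assumption a short form gains the most from the length correction, which is consistent with the lower uncorrected coefficients of the shorter forms.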

The smaller coefficients shown for the Bias data (Col. 5) are
accounted for by the shrinkage of the variance of the distributions of
scores under instructions to bias. When corrected for the difference in
variance these coefficients are not significantly different from those
obtained for Group II (Col. 4).
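The correction for difference in variance is not spelled out in the text. One common form, offered here only as an assumption, holds the error variance constant across the two groups, so that a reliability r found in a group with standard deviation s becomes 1 - (s^2 / S^2)(1 - r) in a group with standard deviation S. In the sketch below the function name is hypothetical, the two standard deviations are the Form E, Key 2 figures as they read in Table 5, and the reliability of .80 is an invented input.

```python
def correct_for_variance(r_restricted, sd_restricted, sd_reference):
    """Reliability expected in the reference group, assuming the error
    variance is the same in both groups."""
    if sd_restricted <= 0 or sd_reference <= 0:
        raise ValueError("standard deviations must be positive")
    variance_ratio = (sd_restricted / sd_reference) ** 2
    return 1 - variance_ratio * (1 - r_restricted)


# SDs as read from Table 5 (Form E, Key 2): Bias 18.0, Group II 26.7.
# The reliability of .80 is hypothetical.
corrected = correct_for_variance(0.80, 18.0, 26.7)
print(round(corrected, 3))
```

The smaller the spread of scores under bias instructions relative to Group II, the larger the upward adjustment.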

More meaningful kinds of reliability than that presented above would
be the agreement between different raters using the same form, different
raters using different forms, and the same rater using different forms.
The first two of these are not obtainable in the technical instructor
situation, since in the great majority of groups there is but one immediate
supervisor. Data on the inter-form reliability using the same raters
are being collected under operational conditions and will be reported
later.

Supervisors' Ratings of the Relative Desirability of the Various
Forms. A number of supervisors who participated in the forced-choice
experiment were asked to state which of the six forms (or however many
forms they used in rating their group of instructors) they liked best,
next best, etc. If a supervisor rated six or more instructors, he made
use of all six of the rating forms. If he had fewer than six instructors,
he used as many of the forms as he had instructors. In other words,
each instructor was rated on only one form, but as many of the forms as
possible were used by each supervisor in rating his instructors. This
resulted in from 60 to 8[?] rankings of each form.

The mean ranks, in the order of most desirable to least desirable,
for each of the six forms were:

1. Form C   2.8[?]
2. Form A   2.90
3. Form F   3.1[?]
4. Form E   [?]
5. Form D   3.[?]9
6. Form B   4.66
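How such mean ranks arise when supervisors rank different numbers of forms can be sketched as follows. The rankings are invented for illustration, the function name is hypothetical, and the treatment of supervisors who used fewer than six forms (ranking only the forms they used, from 1 upward) is an assumption rather than a documented detail of the study.

```python
from collections import defaultdict


def mean_ranks(rankings):
    """rankings: one list per supervisor, forms ordered best-liked first.
    Returns the mean rank of each form over the supervisors who ranked it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in rankings:
        for rank, form in enumerate(order, start=1):
            totals[form] += rank
            counts[form] += 1
    return {form: totals[form] / counts[form] for form in totals}


# Hypothetical rankings from three supervisors.
rankings = [
    ["C", "A", "F", "E", "D", "B"],  # used all six forms
    ["A", "C", "B"],                 # rated only three instructors
    ["C", "F", "A", "B"],            # rated only four instructors
]

result = mean_ranks(rankings)
for form in sorted(result, key=result.get):
    print(form, round(result[form], 2))
```

Because each form is averaged only over the supervisors who used it, the number of rankings per form can differ, as it did in the study.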

It should be recognized that these data reflect only the ranked
desirability of the forced-choice forms presented. Such a ranking cannot
show intensity of feeling, i.e., it is conceivable that two forms having
adjacent ranks might be widely separated on a like-dislike continuum. It
is also possible that the raters may have very much liked or very much
disliked all forms.

The most useful conclusion from the data would seem to be that if
forced-choice forms are to be used, then forms arranged as are Forms C
and A are somewhat less likely to be disliked than are the others.


SUMMARY AND CONCLUSIONS

1. In this study, statements about various aspects of the
performance of Air Force technical school instructors were collected.

2. Preference, favorableness, and discrimination indices of each
statement were computed.

3. Six experimental forced-choice instructor rating forms were
constructed. These differed from each other in number of statements
per block, in homogeneity of blocks with regard to favorableness of
statements, or in directions to the rater.




4. The forms were used by instructor supervisors to rate instructors
at six Air Force bases. Additional supervisors at two bases completed
the forms under instructions to give as high a rating as possible.


5. From the data available, five experimental scoring keys were
developed.


6. The instructor supervisors were asked to rank the instructors
they supervised according to over-all competence as an instructor.
Correlations between these ranks and scores on the rating forms were
separately computed for the six bases, six forms, and the five keys. The
[?]6 coefficients obtained ranged from .4[?]1 to .71[?]. Under conditions
of cross-validation, Forms C and D (4-item blocks, all favorable
statements) had the highest average validity.


7. When supervisors were instructed to give as high a rating as
possible and when biasability was estimated in terms of the resulting
mean shift, Form C proved least biasable, with Forms E and F being
markedly less resistant to efforts to bias than the other four forms.


8. Reliability coefficients for each form were computed by a
modification of the odd-even method. These ranged from .657 to .959. When
corrected for differences in lengths of forms, the coefficients ranged
from .908 to .966.


9. Supervisors were asked to rank the six forms as to desirability.
Forms C and A were about equally preferred over the others.


10. It is concluded that, of the six forms tested here, Form C
yields the best over-all results, since it was one of the two highest
in validity, was lowest in biasability, had satisfactory reliability,
and was one of the two forms best liked, or least disliked, by the
raters.


11. The conclusions of the study are limited by the fact that the
data were collected in an experimental situation. Forms A, B, C, and D
are now in use for the regular rating of Air Force technical school
instructors. The resulting data are to be compared with those presented
here.